EDRAK: Entity-Centric Data Resource for Arabic Knowledge
Abstract
Online Arabic content is growing very rapidly, but this growth is not matched by growth in structured Arabic resources. Systems that perform standard Natural Language Processing (NLP) tasks such as Named Entity Disambiguation (NED) struggle to deliver decent quality due to the lack of rich Arabic entity repositories. In this paper, we introduce EDRAK, an automatically generated, comprehensive, entity-centric Arabic resource. EDRAK contains more than two million entities together with their Arabic names and contextual keyphrases. Manual evaluation confirmed the quality of the generated data. We are making EDRAK publicly available as a valuable resource to help advance research in Arabic NLP and IR tasks such as dictionary-based Named Entity Recognition, entity classification, and entity summarization.
Similar resources
AIDArabic+ Named Entity Disambiguation for Arabic Text
Named Entity Disambiguation (NED) is the problem of mapping mentions of ambiguous names in a natural language text onto canonical entities such as people or places, registered in a knowledge base. Recent advances in this field enable semantically understanding content in different types of text. While the problem had been extensively studied for the English text, the support for other languages...
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use is still limited. Data vary heavily with respect to their scale, quality, coverage or dynamics, what poses challenges for tasks such as entity retrieval or search. This chapter provides an overview of approaches to deal with the increasing heterogeneity of Web data. On the one hand, recomme...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A global Entity Name System (ENS) for data ecosystems
After decades of schema-centric research on data management and integration, the evolution of data on the web and the adoption of resource-based models seem to have shifted the focus towards an entity-centric approach. Our thesis is that the missing element to achieve the full potential of this approach is the development of what we call an Entity Name System (ENS), namely a system which provid...